1. Understand the Dataset

The dataset includes:

  • ride_id: Unique identifier for each ride.
  • rideable_type: Type of bike used.
  • started_at and ended_at: Start and end times of each trip.
  • start_station_name and end_station_name: Start and end stations of the trip.
  • start_lat and start_lng, end_lat and end_lng: Latitude and longitude of the start and end points.
  • member_casual: Indicates whether the rider is a “member” or “casual”.

2. Data Preparation

2.1 Check and Clean the Data

Check for Missing Values

# Check for missing values
colSums(is.na(cleaned_dec_23_tripdata))
##                  X            ride_id      rideable_type         started_at 
##                  0                  0                  0                  0 
##           ended_at start_station_name   start_station_id   end_station_name 
##                  0                  0                  0                  0 
##     end_station_id          start_lat          start_lng            end_lat 
##                  0                  0                  0                  0 
##            end_lng      member_casual      trip_duration        day_of_week 
##                  0                  0                  0                  0 
##        hour_of_day 
##                  0

Remove Missing Values

# Remove rows with missing values
cleaned_dec_23_tripdata <- na.omit(cleaned_dec_23_tripdata)

Ensure Proper Date/Time Format

# Convert date/time fields to proper format
cleaned_dec_23_tripdata$started_at <- as.POSIXct(cleaned_dec_23_tripdata$started_at, format = "%Y-%m-%d %H:%M:%S")
cleaned_dec_23_tripdata$ended_at <- as.POSIXct(cleaned_dec_23_tripdata$ended_at, format = "%Y-%m-%d %H:%M:%S")
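
as.POSIXct returns NA rather than raising an error when a string does not match the given format, so the missing-value check run before the conversion cannot catch parse failures. A minimal sketch of a post-conversion check follows; the three timestamps and the tz = "UTC" setting are illustrative assumptions, not values from the dataset.

```r
# Hypothetical timestamps; the second one lacks a time component
raw <- c("2023-12-01 08:15:00", "2023-12-02", "2023-12-03 17:40:12")

# tz = "UTC" is an assumption; use the zone the trips were recorded in
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Count and locate the values that failed to parse
n_failed <- sum(is.na(parsed))
failed_rows <- which(is.na(parsed))
print(n_failed)
print(raw[failed_rows])
```

Running the same check on started_at and ended_at right after the conversion would show whether any rows need attention before the calculated columns are added.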

Remove Duplicate Records

# Remove duplicate rows using dplyr
library(dplyr)
cleaned_dec_23_tripdata <- distinct(cleaned_dec_23_tripdata)

2.2 Add New Calculated Columns

Trip Duration

# Calculate trip duration in minutes
cleaned_dec_23_tripdata$trip_duration <- as.numeric(difftime(
  cleaned_dec_23_tripdata$ended_at,
  cleaned_dec_23_tripdata$started_at,
  units = "mins"
))

Day of Week

# Extract the day of the week
cleaned_dec_23_tripdata$day_of_week <- weekdays(cleaned_dec_23_tripdata$started_at)

Hour of the Day

# Extract the hour from the start time
cleaned_dec_23_tripdata$hour_of_day <- as.numeric(format(cleaned_dec_23_tripdata$started_at, "%H"))

3. Analyze the Data

3.1 Aggregate Metrics by User Type

Total Number of Rides by User Type

## 
## casual member 
##  36686 130457
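
The code that produced this count is not shown. One way to get a table of this shape is base R's table(); the sketch below uses a small made-up data frame (trips) in place of cleaned_dec_23_tripdata.

```r
# Toy data standing in for cleaned_dec_23_tripdata (values are made up)
trips <- data.frame(
  member_casual = c("member", "casual", "member", "member", "casual")
)

# Count rides per user type
ride_counts <- table(trips$member_casual)
print(ride_counts)
```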

Average Trip Duration by User Type

##   member_casual trip_duration
## 1        casual      16.53441
## 2        member      10.80303
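
The column layout above matches what aggregate() returns with a formula, so the averages may have been computed along these lines. This is a sketch on made-up durations, not the report's original chunk.

```r
# Toy data standing in for cleaned_dec_23_tripdata (values are made up)
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  trip_duration = c(20, 10, 5, 10, 15)  # minutes
)

# Mean trip duration (minutes) for each user type
avg_duration <- aggregate(trip_duration ~ member_casual, data = trips, FUN = mean)
print(avg_duration)
```

Because the mean is sensitive to a few very long trips, comparing it with FUN = median is a cheap cross-check.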

3.2 Analyze Usage Patterns

Bike Type Preferences for Casual and Member riders

##                
##                 casual member
##   classic_bike   20280  84044
##   electric_bike  16406  46413
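
Raw counts are hard to compare because members take far more rides overall. The sketch below rebuilds the table from the counts reported above and converts each column to shares with prop.table(); the object names counts and shares are illustrative.

```r
# Counts copied from the cross-tabulation above (filled column by column)
counts <- matrix(
  c(20280, 16406,   # casual: classic, electric
    84044, 46413),  # member: classic, electric
  nrow = 2,
  dimnames = list(c("classic_bike", "electric_bike"), c("casual", "member"))
)

# Share of each bike type within each user type (columns sum to 1)
shares <- prop.table(counts, margin = 2)
print(round(shares, 3))
```

On these counts, electric bikes account for roughly 45% of casual rides and roughly 36% of member rides.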

4. Visualize Insights

Bar Chart: Total Rides by User Type
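
The chart itself is not reproduced here. A minimal ggplot2 sketch that would draw a comparable bar chart from the ride counts reported in section 3.1 is shown below; the labels and object names are assumptions.

```r
library(ggplot2)

# Ride counts taken from section 3.1
ride_counts <- data.frame(
  member_casual = c("casual", "member"),
  rides = c(36686, 130457)
)

# One bar per user type, with height equal to the number of rides
p <- ggplot(ride_counts, aes(x = member_casual, y = rides, fill = member_casual)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Total Rides by User Type", x = "User type", y = "Number of rides")
print(p)
```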

Bar graph: Peak Hours by User Type

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
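
The stat_bin() warning suggests the plot was a histogram of hour_of_day and that one row had a missing or out-of-range hour. The following sketch, on made-up data, filters such rows explicitly before plotting so that the drop is visible in the code rather than only in a warning.

```r
library(ggplot2)

# Toy data standing in for cleaned_dec_23_tripdata (values are made up)
trips <- data.frame(
  hour_of_day   = c(8, 8, 9, 17, 17, 18, 12, 13, 14, NA),
  member_casual = c(rep("member", 6), rep("casual", 4))
)

# Drop rows with a missing hour instead of letting ggplot2 remove them silently
plot_data <- trips[!is.na(trips$hour_of_day), ]

# One-hour bins centred on each whole hour, one panel per user type
p <- ggplot(plot_data, aes(x = hour_of_day, fill = member_casual)) +
  geom_histogram(binwidth = 1, boundary = -0.5, show.legend = FALSE) +
  facet_wrap(~ member_casual, ncol = 1) +
  scale_x_continuous(breaks = seq(0, 23, by = 3)) +
  labs(title = "Peak Hours by User Type", x = "Hour of day", y = "Number of rides")
print(p)
```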

Heat map: Peak Hours by User Type

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_tile()`).
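
A heat map needs one row per hour and user type with a ride count to map to the fill colour. The sketch below shows one way to build that summary with table() and draw it with geom_tile(); the data and object names are made up for illustration.

```r
library(ggplot2)

# Toy data standing in for cleaned_dec_23_tripdata (values are made up)
trips <- data.frame(
  hour_of_day   = c(8, 8, 17, 17, 17, 12, 14, 14),
  member_casual = c("member", "member", "member", "member",
                    "casual", "casual", "casual", "casual")
)

# Count rides for every hour / user type combination
hourly <- as.data.frame(
  table(hour_of_day = trips$hour_of_day, member_casual = trips$member_casual),
  responseName = "rides"
)

# table() turns the hour into a factor; convert it back to a number
hourly$hour_of_day <- as.numeric(as.character(hourly$hour_of_day))

# One tile per hour and user type, shaded by ride count
p <- ggplot(hourly, aes(x = hour_of_day, y = member_casual, fill = rides)) +
  geom_tile() +
  labs(title = "Peak Hours by User Type", x = "Hour of day", y = "User type")
print(p)
```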

5. Summarize and Recommend

Key Findings

  • Usage Trends: Members took 130,457 rides against 36,686 for casual riders (about 78% of the total), but their trips were shorter on average: 10.8 minutes versus 16.5 minutes.
  • Casual Riders: Ride primarily on weekends and during leisure hours.
  • Members: Use bikes more consistently across the week, especially during commute hours.

Recommendations

  • Design weekend promotions targeting casual riders.
  • Highlight the benefits of annual memberships for regular commuting needs.
  • Increase marketing at the stations most popular with casual riders to convert those riders to members.